fix: export operator call instantiations #623

Open
voltjia wants to merge 2 commits into master from fix/export-call-instantiations

@voltjia (Collaborator) commented May 28, 2026

Summary

  • Reverts the public infini::ops::functional layer added by feat: add public C++ operator API #618.
  • Generates explicit Operator<Op>::Call template instantiations that are compiled into libinfiniops.so, and generates matching extern template declarations for C++ consumers.
  • Restores #include <infini/ops.h> as the public C++ entrypoint for existing operator classes and adds an external C++ smoke test for infini::ops::Add::Call.
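
As a rough illustration of the mechanism in the second bullet, the sketch below compresses a public header, a library source file, and a consumer into one translation unit. Everything in namespace `sketch` is hypothetical: the `Add::Run` helper, the pointer-based argument list, and `AddVectors` are stand-ins chosen for the example, not the real infini::ops signatures or the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical names, not the real infini::ops API

// ---- Part 1: what a public header might expose -------------------------
struct Add {
  // Stand-in for a backend implementation; declared only.
  static void Run(const float* a, const float* b, float* out, std::size_t n);
};

template <typename Op>
struct Operator {
  // Declared here, but the body is not visible to consumers.
  template <typename... Args>
  static void Call(const Args&... args);
};

// Explicit instantiation declaration: the consumer's compiler must not
// instantiate this specialization and instead relies on the linker.
extern template void
Operator<Add>::Call<const float*, const float*, float*, std::size_t>(
    const float* const&, const float* const&, float* const&,
    const std::size_t&);

// ---- Part 2: what a generated library source might contain -------------
void Add::Run(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

template <typename Op>
template <typename... Args>
void Operator<Op>::Call(const Args&... args) {
  Op::Run(args...);  // backend-dependent work stays on the library side
}

// Explicit instantiation definition: emits the symbol into the library.
template void
Operator<Add>::Call<const float*, const float*, float*, std::size_t>(
    const float* const&, const float* const&, float* const&,
    const std::size_t&);

// ---- Part 3: a consumer that only needs the header ---------------------
std::vector<float> AddVectors(const std::vector<float>& a,
                              const std::vector<float>& b) {
  std::vector<float> out(a.size());
  const float* pa = a.data();
  const float* pb = b.data();
  float* po = out.data();
  std::size_t n = a.size();
  Operator<Add>::Call(pa, pb, po, n);  // resolves to the exported symbol
  return out;
}

}  // namespace sketch
```

In a real build, Part 1 would live in the installed header, Part 2 in a source file compiled into the shared library with the vendor toolchain, and Part 3 in downstream code built with an ordinary host compiler.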

Motivation

Closes #593

Downstream C++ consumers should not need backend kernel headers or vendor compilers just to call existing Operator<Op>::Call APIs. Explicit template instantiation keeps the existing operator class API and moves backend-dependent instantiation into libinfiniops.so, avoiding the extra functional wrapper layer.

Rebase Status

  • Rebased onto latest master: 42738491b4306f247c59eba24aaffc1001885cdb (ci: stabilize iluvatar runner and test images (#625)).
  • Current head: e870e3e74b196226ec80cd8ff9f843f838a70f39.

Type of Change

  • N/A — feat: this does not add a new feature/operator/platform.
  • fix — bug fix.
  • N/A — perf: no performance-path change.
  • N/A — refactor: not a behavior-neutral restructuring only.
  • N/A — test: includes tests but is not test-only.
  • build / ci — build system/codegen/linkage configuration.
  • N/A — docs: not documentation-only.
  • N/A — Breaking change: this restores the existing Operator<Op>::Call surface rather than introducing an ABI break.

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Full Platform Test Results

All accelerator runs used card 6. Commands installed the package first, then ran bare pytest with no test-path or device arguments.

| Platform | Build | pytest result | Device / Notes |
| --- | --- | --- | --- |
| NVIDIA | Yes | 9207 passed, 8665 skipped, 81 warnings in 342.43s | ssh nvidia, Docker --gpus "device=6", in-container CUDA_VISIBLE_DEVICES=0 |
| MetaX | Yes | 8699 passed, 7655 skipped, 81 warnings in 399.34s | ssh metax, CUDA_VISIBLE_DEVICES=6 |
| Iluvatar | Yes | 7705 passed, 8649 skipped, 81 warnings in 582.92s | ssh iluvatar, CUDA_VISIBLE_DEVICES=6 |
| Moore | Yes | 8472 passed, 7900 skipped, 99 warnings in 618.44s | ssh moore, MUSA_VISIBLE_DEVICES=6; required LD_PRELOAD=/usr/local/musa-4.3.1/lib/libomp.so because /usr/local/musa/lib/libomp.so does not export __kmpc_for_static_fini |
| Cambricon | Yes | 5900 passed, 10070 skipped, 172 warnings in 978.34s | ssh cambricon, MLU_VISIBLE_DEVICES=6, command exit status 0 |
| Ascend | Yes | 7398 passed, 8914 skipped, 71 warnings in 632.45s | ssh ascend, ASCEND_RT_VISIBLE_DEVICES=6; pytest reached a passing summary, but the outer Docker command wrote exit status 137 after completion, so this is recorded as an environment/teardown anomaly rather than a clean command pass |

Validation commands
# Rebase and push
ssh nvidia 'cd /home/huangjiacheng/infiniops-pr623-rebase && git fetch origin master && git rebase origin/master'
ssh nvidia 'cd /home/huangjiacheng/infiniops-pr623-rebase && git push --force-with-lease origin fix/export-call-instantiations'

# NVIDIA
python -m pip install .[dev] --no-build-isolation && pytest

# MetaX
pip install .[dev] --no-build-isolation && pytest

# Iluvatar
python -m pip install packaging exceptiongroup typing-extensions pygments pybind11 libclang
python -m pip install . --no-build-isolation --no-deps && pytest

# Moore
export LD_PRELOAD=/usr/local/musa-4.3.1/lib/libomp.so
pip install .[dev] --no-build-isolation && pytest

# Cambricon
pip install .[dev] --no-build-isolation && pytest

# Ascend
pip install .[dev] --no-build-isolation && pytest
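
For readers interpreting the Ascend exit status 137: POSIX shells and Docker conventionally report 128 + N when a process is terminated by signal N, so 137 would correspond to signal 9 (SIGKILL on Linux). The PR does not say what sent the signal. The helper below is a hypothetical sketch of that arithmetic, not part of the project's tooling.

```cpp
#include <cassert>

namespace sketch {  // hypothetical helper for illustration only

// Returns the signal number implied by a shell-style exit status, or 0 when
// the status looks like an ordinary exit code (0..128).
int SignalFromExitStatus(int status) {
  return (status > 128 && status < 256) ? status - 128 : 0;
}

}  // namespace sketch
```

Under that convention, `sketch::SignalFromExitStatus(137)` yields 9, while the Cambricon run's status 0 yields 0 (no signal).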

Benchmark / Performance Impact

N/A — this PR changes build/codegen/linkage for C++ operator calls, not operator kernels or performance paths.

Notes for Reviewers

  • Public C++ consumers should include <infini/ops.h> to get the generated extern template declarations before calling infini::ops::<Op>::Call.
  • The generated instantiation sources include backend marker and implementation headers, so downstream consumers no longer need backend kernel headers for the covered Call signatures.
  • Operator::Call now takes const Args&... to make the instantiation signature stable across lvalue/rvalue call sites.
  • Ascend produced a complete passing pytest summary, but the outer Docker process returned 137 after pytest completed. The test result is included for review transparency instead of being marked as a clean command pass.
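
The point about `const Args&...` can be illustrated with a small sketch. It assumes, purely for comparison, that the alternative would be a forwarding-reference signature (`Args&&...`); the PR text does not state what the previous signature was. With forwarding references, an lvalue argument deduces `T` as a reference type and an rvalue deduces a non-reference type, so the two call sites need two different instantiations. With `const T&`, both deduce the same `T`, so one exported instantiation covers both. The function names below are hypothetical.

```cpp
#include <cassert>

namespace sketch {  // hypothetical names for illustration only

// Each distinct instantiation owns its own static `tag`, so the returned
// address identifies which instantiation was called.
template <typename T>
const void* ForwardingId(T&&) {
  static char tag;
  return &tag;
}

template <typename T>
const void* ConstRefId(const T&) {
  static char tag;
  return &tag;
}

}  // namespace sketch
```

Calling `sketch::ForwardingId` with an lvalue `int` and with the literal `2` returns two different addresses (T is `int&` versus `int`), whereas `sketch::ConstRefId` returns the same address for both, which is the property an explicitly instantiated, exported signature needs.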

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type.
  • Each commit message follows Conventional Commits.
  • Every commit is meaningful, well-formed, and independently reviewable.
  • No stray merge commits from master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal and scoped to reverting functional plus exporting Call instantiations.
  • No dead code, debug prints, or unowned TODOs were added.
  • No unrelated formatting churn was introduced.
  • Public API changes are intentional and covered by the external C++ smoke test.

General Code Hygiene

  • Comments were added only where the behavior is non-obvious.
  • Modified and added files end with trailing newlines.
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages use backticks where applicable.
  • Comments and error messages are in English.
  • Comments and error messages follow project conventions.

C++ Specific

  • Code follows the Google C++ Style Guide.
  • clang-format --dry-run --Werror include/infini/ops.h src/operator.h passed on ssh nvidia in the PR creation pass.
  • N/A — clang-tidy was not run in this pass; this PR does not add new kernel logic.
  • N/A — operator parameter order is unchanged.
  • No exceptions are thrown.
  • No new error or warning messages were introduced.
  • N/A — no kernel files were added or renamed.
  • N/A — no kernel/kernel launcher split was changed.
  • N/A — no constructor initializer lists were changed.
  • Namespace and spacing in touched C++ headers are formatted by clang-format.
  • N/A — no new operators were added.
  • No raw new/delete was introduced.

Python Specific

  • ruff check scripts/generate_wrappers.py tests/test_cpp_api.py passed in the PR creation pass.
  • ruff format --check scripts/generate_wrappers.py tests/test_cpp_api.py passed in the PR creation pass.
  • Comments and strings follow surrounding Python conventions.
  • Framework-specific pytest.skip conventions are preserved.
  • Function-body spacing follows the surrounding style.
  • Control-flow spacing follows the surrounding style.
  • Return statement spacing follows the surrounding style.
  • N/A — no docstrings were added.
  • N/A — the touched generator/test code follows the existing no-type-hints style.

Testing

  • Full bare pytest was run on all six supported accelerator platforms: NVIDIA, MetaX, Iluvatar, Moore, Cambricon, and Ascend.
  • All platform results are listed in the table above, including the Ascend outer-process 137 teardown anomaly.
  • New functionality has a regression smoke test under tests/.
  • N/A — the new smoke test does not need parametrization.
  • N/A — pytest.mark.auto_act_and_assert is not applicable to this external compile/link smoke test.
  • N/A — dtype/device parameterization is not applicable to this external compile/link smoke test.
  • N/A — no new flaky test behavior was observed in the passing pytest summaries.
  • The external Add::Call smoke test fails against the previous behavior, where the backend dispatch is empty, and passes with this PR.

Build, CI, and Tooling

  • pip install/wheel build was run as part of every full-platform pytest command above.
  • N/A — compile_commands.json regeneration was not separately checked.
  • N/A — no new backend/device was added.
  • Existing CUDA-like backend mutual-exclusion logic was not changed.
  • Local equivalents of clang-format.yml and ruff.yml checks passed for touched files in the PR creation pass.
  • No new runtime dependency was added.

Documentation

  • N/A — README/CONTRIBUTING updates are not required for this internal codegen/linkage fix.
  • N/A — no new operator or public utility API was added beyond restoring <infini/ops.h> as the entrypoint.
  • N/A — no breaking change is introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers were committed.
  • N/A — no third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@voltjia force-pushed the fix/export-call-instantiations branch from ca40c6f to e870e3e on May 30, 2026, 01:49
@voltjia marked this pull request as ready for review on May 30, 2026, 04:17
@voltjia requested review from a team, Ziminli, and crapromer on May 30, 2026, 04:17
@voltjia (Collaborator, Author) commented May 30, 2026

@crapromer for initial review, @Ziminli for final review.

Development

Successfully merging this pull request may close these issues.

Export C API from libinfiniops.so to eliminate vendor compiler requirement for downstream consumers
